Make sure that you save it in the relevant folder.
Creating a New Quarto Document
This is how we render the .qmd file to HTML.
This is how we add a code chunk.
Only R code goes inside the chunk.
Notice how the R code chunk starts with three backticks followed by {r}.
Notice how the R code chunk ends with three backticks on their own line.
Opening the File
We first remove any objects left in memory from earlier work.
# Remove all objects from memory
rm(list = ls())
To get the file path, we simply navigate to the relevant folder and select the file.
# This opens a file dialog to select your file
file_path <- file.choose()
file_path
Once we have the path, we can now read the files:
# Defining Paths
file_path <- "/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/research_workshop/essential_stats/week3/lab/data/"
# Use file.path() to construct full path
life_expectancy_df <- read.csv(file.path(file_path, "life-expectancy.csv"))
urbanization_df <- read.csv(file.path(file_path, "share-of-population-urban.csv"))
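To see how file.path() and read.csv() work together without depending on the folder above, here is a minimal sketch that writes a small made-up CSV to a temporary folder and reads it back. The file name toy-life-expectancy.csv, the object names, and the values are assumptions for illustration only; they are not part of the lab data.

```r
# Hypothetical example: a temporary folder stands in for the data folder
data_dir <- tempdir()

# file.path() joins folder and file name with "/" as the separator
csv_path <- file.path(data_dir, "toy-life-expectancy.csv")

# A made-up data frame with the same kind of columns as the lab data
toy <- data.frame(
  Entity = c("A", "B"),
  Year = c(1900, 1900),
  life_exp = c(40.5, 42)
)

# Write the toy data to disk, then read it back with read.csv()
write.csv(toy, csv_path, row.names = FALSE)
toy_back <- read.csv(csv_path)

# Inspect the structure of what was read
str(toy_back)
```

Running str() right after importing is a quick way to confirm that the columns came through with the expected names and types.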
What We Want to Do
Explanation
These are our datasets
Doing it
Calculating Average by Country Code
This is what this looks like in code for life_expectancy_df2:
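As a rough illustration of averaging by country code, here is a minimal sketch on made-up data. The column names Code and life_exp, the object names, and all values are assumptions; the real dataset may use different names, and this uses base R aggregate(), which may not be the approach used in the lab.

```r
# Toy data standing in for the life-expectancy file (values are made up)
toy_df <- data.frame(
  Code = c("AAA", "AAA", "BBB", "BBB"),
  life_exp = c(60, 70, 50, 54)
)

# Average life expectancy per country code:
# the formula reads "life_exp grouped by Code"
toy_means <- aggregate(life_exp ~ Code, data = toy_df, FUN = mean)

# Give the averaged column a more descriptive (hypothetical) name
names(toy_means)[2] <- "life_exp_mean"
toy_means
```

The result has one row per country code, which is the shape needed before rounding the means and counting them.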
# Step 5: Identifying how to extract the variable names
names(freq_table)
[1] "Var1" "Freq"
Creating a Bar Plot from a Frequency Table
Step 2: Creating a frequency table
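The default names "Var1" and "Freq" are what as.data.frame() produces when it is applied to the result of table() on a data frame column. A minimal sketch on made-up values, assuming the frequency table was built this way (the object names toy and toy_freq are hypothetical):

```r
# Made-up country averages, standing in for the real data
toy <- data.frame(life_exp_mean = c(70.2, 70.9, 71.4, 71.6, 72.1, 71.8))

# Round the averages so that similar values fall into the same bin
toy$life_exp_mean_rounded <- round(toy$life_exp_mean)

# table() counts how often each rounded value occurs;
# as.data.frame() turns the counts into a two-column data frame
toy_freq <- as.data.frame(table(toy$life_exp_mean_rounded))

names(toy_freq)
toy_freq
```

Note that the first column comes back as a factor, not a number, which is why a conversion step is needed before plotting it on a numeric axis.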
# Step 6: Providing more intuitive names
names(freq_table)[1] <- "life_exp_mean_rounded"
names(freq_table)[2] <- "frequency"
# Step 8: Turning factor variables into numeric
freq_table$life_exp_mean_rounded <- as.numeric(as.character(freq_table$life_exp_mean_rounded))
str(freq_table)
'data.frame': 35 obs. of 2 variables:
$ life_exp_mean_rounded: num 38 43 44 45 46 47 48 49 50 51 ...
$ frequency : int 1 2 3 4 4 5 5 2 2 4 ...
names(freq_table)
[1] "life_exp_mean_rounded" "frequency"
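The conversion goes through as.character() first because calling as.numeric() directly on a factor returns the internal level codes, not the values shown. A small sketch with a made-up factor (the name f is hypothetical):

```r
# A factor whose labels look like numbers
f <- factor(c("38", "43", "44"))

# Direct conversion returns the level codes (positions of the levels)
codes <- as.numeric(f)

# Converting to character first recovers the actual values
values <- as.numeric(as.character(f))

codes   # 1 2 3
values  # 38 43 44
```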
Inspecting the new dataframe
freq_table
Step 3: Creating the bar plot
library(ggplot2)
# Step 9: Creating the bar plot
fig5 <- ggplot(data = freq_table, aes(x = life_exp_mean_rounded, y = frequency)) +
  geom_bar(stat = "identity") +
  theme_bw()
fig5
What do you think caused the dip in life expectancy?
Exploring Data
What is the year with the lowest life expectancy for the US?
# Creating a new df with the US
df_us_after1900 <- subset(df_us_uk_after1900, Entity == "United States")
df_us_after1900$Year[df_us_after1900$life_exp_yearly == min(df_us_after1900$life_exp_yearly)]
[1] 1918
What is the year with the second lowest life expectancy for the US?
# Loading dplyr for %>% and arrange()
library(dplyr)
# Arranging and creating a new dataframe (sorted from lowest to highest)
df <- df_us_after1900 %>% arrange(life_exp_yearly)
# Selecting the second lowest
second_lowest_life_expectancy <- df$life_exp_yearly[2]
# Selecting the year with the second lowest
df$Year[df$life_exp_yearly == second_lowest_life_expectancy]
[1] 1901
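The same kind of lookup can be sketched in base R with order(), which returns the row positions sorted from lowest to highest. The data below are made up for illustration and the object names toy and ranked are hypothetical; ties between years are not handled.

```r
# Made-up yearly values, standing in for one country's series
toy <- data.frame(
  Year = c(1900, 1901, 1902, 1903),
  life_exp_yearly = c(48.2, 47.1, 49.5, 46.0)
)

# Sort the rows from lowest to highest life expectancy
ranked <- toy[order(toy$life_exp_yearly), ]

# The first row holds the lowest value, the second row the second lowest
lowest_year <- ranked$Year[1]
second_lowest_year <- ranked$Year[2]

lowest_year         # 1903
second_lowest_year  # 1901
```

Once the data are sorted, any rank can be read off by position, so the third lowest would be ranked$Year[3].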
What is the year with the lowest life expectancy for the UK?
# Creating a new df with the UK
df_uk_after1900 <- subset(df_us_uk_after1900, Entity == "United Kingdom")
df_uk_after1900$Year[df_uk_after1900$life_exp_yearly == min(df_uk_after1900$life_exp_yearly)]
[1] 1901
Conclusion
What Did We Learn Today?
How to import and clean real-world data
The difference between histograms, bar plots, and line plots
How to use ggplot2 for data visualization
How to explore patterns and trends in life expectancy and urbanization